This report is submitted by Greeshma Jeev Koothuparambil and Olayemi Morrison as a part of Laboratory 6 of Visualization (732A98) Course for the 2023 Autumn Semester.

Assignment 1

Following are the libraries used for the successful completion of this assignment:
tidytext
dplyr
tidyr
readr
textdata
wordcloud
RColorBrewer
plotly
ggplot2
visNetwork

Here is how we loaded our libraries:

library(tidytext)
library(dplyr)
library(tidyr)
library(readr)
library(textdata)
library(wordcloud)
library(RColorBrewer)
library(plotly)
library(ggplot2)
library(visNetwork)

Reading the the file Five.txt and OneTwo.txt

#Read the file
text5=read_lines("Five.txt")
text12 = read_lines("OneTwo.txt")

1. Visualize word clouds corresponding to Five.txt and OneTwo.txt and make sure that stop words are removed. Which words are mentioned most often?

## first plot

#color pallette
pal <- brewer.pal(8,"Dark2")

## Working on Five data
textFrame5=tibble(text=text5)%>%mutate(line = row_number())
#one-token-per-row
tidy_frame5=textFrame5%>%unnest_tokens(word, text)
#removing stopwords
tidy_frame51=tidy_frame5%>%anti_join(stop_words, by="word")
#count word
tidy_framecount5=tidy_frame51%>%count(word, sort=TRUE)

## Working on one two data
textFrame12=tibble(text=text12)%>%mutate(line = row_number())
#one-token-per-row
tidy_frame12=textFrame12%>%unnest_tokens(word, text)
#removing stopwords
tidy_frame121=tidy_frame12%>%anti_join(stop_words, by="word")
#count word
tidy_framecount12=tidy_frame121%>%count(word, sort=TRUE)

The word cloud for Five.txt looks like this:

The word cloud for OneTwo.txt looks like this:

Analysis

In the world cloud of data five it is the word watch which is mentioned the most followed by time, casio, price, watches, band, digital and nice. The word cloud of data onetwo as well show watch as the most repeated word followed by time and casio.


2. Without filtering stop words, compute TF-IDF values for OneTwo.txt by aggregating each 10 lines into a separate “document”. Afterwards, compute mean TF-IDF values for each word over all documents and visualize them by the word cloud. Compare the plot with the corresponding plot from step 1. What do you think the reason is behind word “watch” being not emphasized in TF-IDF diagram while it is emphasized in the previous word clouds?

# second plot

#tf-idf, treating 10 lines as separate document
tidy_frame12tfidf=textFrame12%>%unnest_tokens(word, text)%>%
  mutate(line1=floor(line/10))%>%
  count(line1,word, sort=TRUE)

TFIDF12=tidy_frame12tfidf%>%bind_tf_idf(word, line1, n)
tfidfmean12 <- TFIDF12%>% group_by(word) %>% summarise(tfidfmean = mean(tf_idf))

The word cloud with tfidf mean look like this:

Analysis

We are using tfidf in the data processing which bring out the important words in the document. The plot says that according to tfidf algorithm words like bad, luminesence, shockingly and could are the most important words. These are followed by words like actual, loud, hour and perfect. The algorithm is emphasising on the importance of a word rather than its number of occurrences. This is why the word watch is losing its priority in the plot.


3. Aggregate data in chunks of 5 lines and compute sentiment values (by using “afinn” database) for respective chunks in Five.txt and for OneTwo.txt . Produce plots visualizing aggregated sentiment values versus chunk index and make a comparative analysis between these plots. Does sentiment analysis show a connection of the corresponding documents to the kinds of reviews we expect to see in them?

# third plot

#sentiment, treating 5 lines as separate document

tidy_frame12senti=textFrame12%>%unnest_tokens(word, text)%>%
  left_join(get_sentiments("afinn"))%>%
  mutate(line1=floor(line/5))%>%
  group_by(line1, sort=TRUE)%>%
  summarize(Sentiment=sum(value, na.rm = T))

p1 <- plot_ly(tidy_frame12senti, x=~line1, y=~Sentiment)%>%add_bars()


tidy_frame5senti=textFrame5%>%unnest_tokens(word, text)%>%
  left_join(get_sentiments("afinn"))%>%
  mutate(line1=floor(line/5))%>%
  group_by(line1, sort=TRUE)%>%
  summarize(Sentiment=sum(value, na.rm = T))

p2 <- plot_ly(tidy_frame5senti, x=~line1, y=~Sentiment)%>%add_bars()

The sentiment graph for Five.txt is as follows:

The sentiment graph for OneTwo.txt is as follows:

Analysis

As for the plot for five data, the sentiment values range from 1 to 60. The graph clearly exposes the interest expressed in the data.

As for the plot for onetwo data, the sentiment values range from -22 to 29. The graph, unfortunately, does not add up to the real sentiment of the data. For most of the lines, the plot gives out a positive reaction even though most of the reviews are negative in nature. This could be because of the omission of words like caution, quit, replace, unfortunately and so forth.


4. Create the phrase nets for Five.Txt and One.Txt with connector words
• am, is, are, was, were
• at
When you find an interesting connection between some words, use Word Trees https://www.jasondavies.com/wordtree/ to understand the context better. Note that this link might not work properly in Microsoft Edge (if you are using Windows 10) so use other browsers.

# Fourth plot

phraseNet=function(text, connectors){
  textFrame=tibble(text=paste(text, collapse=" "))
  tidy_frame3=textFrame%>%unnest_tokens(word, text, token="ngrams", n=3)
  tidy_frame3
  tidy_frame_sep=tidy_frame3%>%separate(word, c("word1", "word2", "word3"), sep=" ")
  
  #SELECT SEPARATION WORDS HERE: now "is"/"are"
  tidy_frame_filtered=tidy_frame_sep%>%
    filter(word2 %in% connectors)%>%
    filter(!word1 %in% stop_words$word)%>%
    filter(!word3 %in% stop_words$word)
  tidy_frame_filtered
  
  edges=tidy_frame_filtered%>%count(word1,word3, sort = T)%>%
    rename(from=word1, to=word3, width=n)%>%
    mutate(arrows="to")
  
  right_words=edges%>%count(word=to, wt=width)
  left_words=edges%>%count(word=from, wt=width)
  
  #Computing node sizes and in/out degrees, colors.
  nodes=left_words%>%full_join(right_words, by="word")%>%
    replace_na(list(n.x=0, n.y=0))%>%
    mutate(n.total=n.x+n.y)%>%
    mutate(n.out=n.x-n.y)%>%
    mutate(id=word, color=brewer.pal(9, "Blues")[cut_interval(n.out,9)],  font.size=40)%>%
    rename(label=word, value=n.total)
  
  #FILTERING edges with no further connections - can be commented
  edges=edges%>%left_join(nodes, c("from"= "id"))%>%
    left_join(nodes, c("to"="id"))%>%
    filter(value.x>1|value.y>1)%>%select(from,to,width,arrows)
  
  nodes=nodes%>%filter(id %in% edges$from |id %in% edges$to )
  
  visNetwork(nodes,edges)
  
}

phrase5 <- phraseNet(text5, c("is", "are", "am", "was","were","at"))
phrase12 <- phraseNet(text12, c("is", "are", "am", "was","were","at"))

The resulting visNetwork for Five.txt is as follows:

The resulting visNetwork for OneTwo.txt is as follows:

Analysis

Most associated words are ‘watch’, ‘time’, ‘easy’ and ‘night’ according to the phrase net of data five. The graph shows a connection between the nodes ‘time’ and ‘night’ as well. But when compared to the word tree created, it is noticeable that most of the associated words are missing in the word tree. For instance, the word tree algorithm is starting the tree with the word “this” as the root word. Therefore it misses on the a highly associated word ‘unbeatable’ since the sentence involving the word ‘unbeatable’ is “At $49 this watch is unbeatable” which does not start with the word ‘this’.
Word Tree for Five.txt
The phrase net of onetwo data says that most associated words are watch, keeping, alarm, night and display. The word nodes ‘watch’ and ‘alarm’ are associated with the word ‘defective’. Although most of the phrase links make sense, some remain senseless. For example, the word ‘watch’ is associated with the word ‘supposed’ which does not add any meaning to the analysis. The said sentence from the document is as follows : “This watch was supposed to be great to my soccer game day but after I wear it for a week it doesn’t work good enough.” same goes for the case of the link between ‘alarm’ and ‘practically’ where the sentence is “As time went by, it became unmanageable, and the alarm is practically unusable, the button for the chronometer, I think the electronic panel got a bit confused, you know, these watches sometimes get to be stored in a warehouse for months at a time, even over a year, so when you get it you think it’s brand new, but maybe it’s been stored for a year and half, God knows.” The word cloud seems a little bit harder to manage compared to phrase net, because the relationships formed by words ‘at’ are harder to find in the wordtree. Also it is difficult to find relation nodes like the word ‘defective’ which connects ‘alarm,’ and ‘watch’.
Word Tree for OneTwo.txt


5. Based on the graphs obtained in step 4, comment on the most interesting findings, like:
• Which properties of this watch are mentioned mostly often?
The most common discussion topics are performance of the watch, the time, display, alarm,ease of usage.

• What are satisfied customers talking about?
The satisfied customers talks about the features of the watch, its durability, functionalities and the place they got the watch from

• What are unsatisfied customers talking about?
The unsatisfied customers talk about the display and alarm features of the watch.

• What are properties of the watch mentioned by both groups?
Both graphs talk about the watch, their display and the nighttime features.

• Can you understand watch characteristics (like size of display, features of the watches) by observing these graphs?
Yes , we can understand about display, night time features, areas of defects and the feelings of customers about the watch.


Assignment 2

Following are the libraries used for the successful completion of this assignment:
plotly
crosstalk
tidyr
GGally

Here is how we loaded our libraries:

library(plotly)
library(crosstalk)
library(tidyr)
library(GGally)

Reading the the file olive.csv in to the dataframe

olive <- read.csv("olive.csv")

o1 <- SharedData$new(olive)

1. Create an interactive scatter plot of the eicosenoic against linoleic. You have probably found a group of observations having unusually low values of eicosenoic. Hover on these observations to find out the exact values of eicosenoic for these observations.

#Question 2.1 - creating an interactive scatterplot
scatterOlive <- plot_ly(o1, x = ~linoleic, y = ~eicosenoic) %>%
  add_markers(color = I("black"))

The scatterplot looks like this:

Analysis

While hovering on the observations with low values of eicosenoic, the exact values for these observations are identified to be 1, 2 or 3.

knitr::include_graphics("Q2-1.png")


2. Link the scatterplot of (eicosenoic, linoleic) to a bar chart showing Region and a slider that allows to filter the data by the values of stearic. Use persistent brushing to identify the regions that correspond unusually low values of eicosenoic. Use the slider and describe what additional relationships in the data can be found by using it. Report which interaction operators were used in this step.

#Question 2.2 - creating the linked bar graph
barOlive <-plot_ly(o1, x=~as.factor(Region))%>%add_histogram()%>%layout(barmode="overlay")


p01 <- bscols(widths=c(2, NA),filter_slider("STE", "Stearic", o1, ~stearic)
       ,subplot(scatterOlive,barOlive)%>%
         highlight(on="plotly_selected", dynamic=T, persistent = T, opacityDim = I(1))%>%hide_legend())

The linked bar graph looks like this:

Analysis

Through persistent brushing, the identified that data points with low values of eicosenoic primarily belong to regions 2 and 3.

knitr::include_graphics("Q2-2.png")

There are also observed changes in the distribution of the data as the slider is adjusted, causing some points to disappear and altering the y-axis values for both the scatter plot and the bar graph. This indicates that the filter is effective in hiding data points that do not meet the specified stearic criteria. On both the scatter plot and the bar graph, as the value of stearic is decreased on the slider, the range of values on the y-axis reduces. For instance, the initial y-axis range for the scatter plot was 0 to 60, but it changes to 0 to 40 as we move the slider leftward. Similarly, the bar graph’s y-axis range decreases from 0 to 300 to 0 to 150. This implies that the filtered data subset has lower eicosenoic values.

knitr::include_graphics("Q2-2b.png")

The selection operator was used to highlight and draw attention to data points that exhibited low y-values. This facilitated the identification of patterns or trends associated with regions 2 and 3, where the majority of low y-values were observed.

The connection operator was used to connect the scatter plot with the bar graph showing regions (1, 2, and 3), allowing us to simultaneously explore and understand relationships between variables. For instance, when selecting data points with low eicosenoic values in the scatter plot, the corresponding bars in the bar graph representing regions 2 and 3 were highlighted. This emphasizes the association of low eicosenoic values with specific regions, providing context.

The filtering provided the exploration of the dataset with a focus on specific stearic conditions. By adjusting the slider, the data is filtered to meet certain conditions, such as selecting data points where stearic was below a certain threshold. This dynamic filtering allows us observe how changes in stearic impact the distribution of data points on both the scatter plot and the bar graph’s y-axis..


3. Create linked scatter plots eicosenoic against linoleic and arachidic against linolenic. Which outliers in (arachidic, linolenic) are also outliers in (eicosenoic, linoleic)? Are outliers grouped in some way? Use brushing to demonstrate your findings.

#Question 2.3 - Create linked scatter plots
scatterOlive2 <- plot_ly(o1, x = ~linoleic, y = ~arachidic) %>%
  add_markers(color = I("black"))

p02 <- subplot(scatterOlive,scatterOlive2)%>%
  highlight(on="plotly_selected", dynamic=T, persistent=T, opacityDim = I(1))%>%hide_legend()

Thelinked scatter plot is as follows:

Analysis

Upon further observation of the two scatter plots with linoleic as the common factor, there is a discrepancy in the grouping of the outliers.
In Plot 1 (eicosenoc vs linoleic), a distinct cluster of data points with low values on the y-axis is easily identifiable. However, when we examine the corresponding data points on the y-axis in Plot 2, we do not observe a similar clustering pattern. Instead, the data points seem to be distributed throughout the plot, with only a small concentration of low values. This discrepancy may be attributed to the influence of additional variables such as geographical location, as we have a variable named Area on the original dataset. This or other factors could affect the distribution of data points on the second plot.

knitr::include_graphics("Q2-3.png")


4. Create a parallel coordinate plot for the available eight acids, a linked 3d-scatter plot in which variables are selected by three additional drop boxes and a linked bar chart showing Regions. Use persistent brushing to mark each region by a different color.Observe the parallel coordinate plot and state which three variables (let’s call them influential variables) seem to be mostly reasonable to pick up if one wants to differentiate between the regions. Does the parallel coordinate plot demonstrate that there are clusters among the observations that belong to the same Region? Select the three influential variables in the drop boxes and observe in the 3d-plot whether each Region corresponds to one cluster.

#Question 2.4 - Parallel coords
p<-ggparcoord(olive, columns = c(4:11))

d<-plotly_data(ggplotly(p))%>%group_by(.ID)
d1<-SharedData$new(d, ~.ID, group="olive")
p1<-plot_ly(d1, x=~variable, y=~value)%>%
  add_lines(line=list(width=0.3))%>%
  add_markers(marker=list(size=0.3),
              text=~.ID, hoverinfo="text")


olive2=olive
olive2$.ID=1:nrow(olive)
d2<-SharedData$new(olive2, ~.ID, group="olive")
p2<-plot_ly(d2, x=~factor(Region) )%>%add_histogram()%>%layout(barmode="stack")


ButtonsX=list()
for (i in 4:11){
  ButtonsX[[i-3]]= list(method = "restyle",
                        args = list( "x", list(olive[[i]])),
                        label = colnames(olive)[i])
}

ButtonsY=list()
for (i in 4:11){
  ButtonsY[[i-3]]= list(method = "restyle",
                        args = list( "y", list(olive[[i]])),
                        label = colnames(olive)[i])
}

ButtonsZ=list()
for (i in 4:11){
  ButtonsZ[[i-3]]= list(method = "restyle",
                        args = list( "z", list(olive[[i]])),
                        label = colnames(olive)[i])
}

p3 <- plot_ly(d2,x=~stearic,y=~oleic,z=~linoleic)%>%add_markers() %>%
  layout(xaxis=list(title=" "), yaxis=list(title=" "), zaxis=list(title=" "),
         title = "Select variable:",
         updatemenus = list(
           list(y=0.9, buttons = ButtonsX),
           list(y=0.6, buttons = ButtonsY),
           list(y=0.3, buttons = ButtonsZ)
         )  )

ps<-htmltools::tagList(p1%>%
                         highlight(on="plotly_selected", dynamic=T, persistent = T, opacityDim = I(1))%>%
                         hide_legend(),
                       p2%>%
                         highlight(on="plotly_selected", dynamic=T, persistent = T, opacityDim = I(1))%>%
                         hide_legend(), 
                       p3%>%
                         highlight(on="plotly_selected", dynamic=T, persistent = T, opacityDim = I(1))%>%
                         hide_legend()
)
htmltools::browsable(ps)

The parallel cords plot is as shown

Analysis

From the observation of the parallel coordinate plot, the most influential variables are highlighted below:

Region 1 - eicosenoic is clearly the influential variable, as we can easily identify the region through this variable.

Region 2 - oleic is the influential variable for identifying this region

Region 3 - linoelic is the influential variable here, although it is noted that eicosenoic is also able to identify this region.

knitr::include_graphics("Q2-4parcoords.png")

Within the linked visualizations, there is a consistent alignment between regions in the bar graph and the parallel coordinate plot. There are no instances of different clusters within the same region. This is confirmed on the 3d scatter plot when the influential variables are selected in the dropdown menu. Each region corresponds to one cluster.

knitr::include_graphics("Q2-4barplot.png")

knitr::include_graphics("Q2-4scatter.png")


5. Think about which interaction operators are available in step 4 and what interaction operands they are be applied to. Which additional interaction operators can be added to the visualization in step 4 to make it even more efficient/flexible? Based on the analysis in the previous steps, try to suggest a strategy (or, maybe, several strategies) that would use information about the level of acids to discover which regions different oils comes from.

The following are the interactive operators applied on their respective interactive operands:

Selection operator is applied to bar chart using the highlight(on = “plotly_selected” option. The selection of the bar is triggered through a rectangular or lasso drag mode. The bar is then coloured based on the brush colour selected.

Reconfiguring operator is applied to the 3d scatter plot. This transforms the data by changing the variables of the x, y and z axis from the dropdown menu.

Connection operator is applied to all 3 plots, from the parallel coordinate plot, to the bar graph and the 3d scatter plot.

An additional interaction operator that could be included is the filtering operator on the parallel coordinates, this can be utilized in such a way that links which don’t meet the specific criteria disappear, leaving only the selected ones.

Here are two strategies for determining the region of origin for different oils based on the levels of acids:

Parallel Coordinate Plot Clusters: Given that it is possible for the parallel coordinate plot to highlight regions based on persistent brushing, we can further refine this by creating a parallel coordinate plot where the lines connecting data points are color-coded based on region. Then, using Plotly’s interactivity, allow users to select specific oils or data points in the parallel coordinate plot. When a user selects an oil, highlight its acid profile on the 3D scatter plot. This approach helps users visually explore which region-specific acid profiles align with the selected oil, aiding in region identification.

3D Scatter Plot Exploration: When users hover over data points representing oils in the scatter plot, we can display information about the acid levels and the region of origin. Using plotly’s interactive features allow users to explore the scatter plot and quickly identify which region specific oils belong to based on their acid profiles.


STATEMENT OF CONTRIBUTION

For the first assignment coding and analysis was done by Greeshma Jeev, while for the second assignment, coding and analysis part was done by Olayemi. We both went through the outputs and the analysis to make our own suggestions to the results in order to make this report a grand success.

The RMD file was designed together and coded by both Greeshma and Olayemi. Content writing was done by both Olayemi and Greeshma Jeev.


APPENDIX

Code for Assignment 1 (Five.txt and OneTwo.txt Data)

# Set the working directory
setwd("R/")

#Read the libraries

library(tidytext)
library(dplyr)
library(tidyr)
library(readr)
library(textdata)
library(wordcloud)
library(RColorBrewer)
library(plotly)
library(ggplot2)
library(visNetwork)

#Read the file
text5=read_lines("Five.txt")
text12 = read_lines("OneTwo.txt")

## first plot

#color pallette
pal <- brewer.pal(8,"Dark2")

## Working on Five data
textFrame5=tibble(text=text5)%>%mutate(line = row_number())
#one-token-per-row
tidy_frame5=textFrame5%>%unnest_tokens(word, text)
#removing stopwords
tidy_frame51=tidy_frame5%>%anti_join(stop_words, by="word")
#count word
tidy_framecount5=tidy_frame51%>%count(word, sort=TRUE)
#wordcloud 5
tidy_framecount5%>%
  with(wordcloud(word, n, max.words = 100, colors=pal, random.order=F))

## Working on one two data
textFrame12=tibble(text=text12)%>%mutate(line = row_number())
#one-token-per-row
tidy_frame12=textFrame12%>%unnest_tokens(word, text)
#removing stopwords
tidy_frame121=tidy_frame12%>%anti_join(stop_words, by="word")
#count word
tidy_framecount12=tidy_frame121%>%count(word, sort=TRUE)
#wordcloud 12
tidy_framecount12%>%
  with(wordcloud(word, n, max.words = 100, colors=pal, random.order=F))


# second plot

#tf-idf, treating 10 lines as separate document
tidy_frame12tfidf=textFrame12%>%unnest_tokens(word, text)%>%
  mutate(line1=floor(line/10))%>%
  count(line1,word, sort=TRUE)

TFIDF12=tidy_frame12tfidf%>%bind_tf_idf(word, line1, n)
tfidfmean12 <- TFIDF12%>% group_by(word) %>% summarise(tfidfmean = mean(tf_idf))

#wordcloud TFIDF12
tfidfmean12 %>%  with(wordcloud(word, tfidfmean, max.words = 100, colors=pal, random.order=F,scale = c(2, 0.5)))



# third plot

#sentiment, treating 5 lines as separate document

tidy_frame12senti=textFrame12%>%unnest_tokens(word, text)%>%
  left_join(get_sentiments("afinn"))%>%
  mutate(line1=floor(line/5))%>%
  group_by(line1, sort=TRUE)%>%
  summarize(Sentiment=sum(value, na.rm = T))

p1 <- plot_ly(tidy_frame12senti, x=~line1, y=~Sentiment)%>%add_bars()


tidy_frame5senti=textFrame5%>%unnest_tokens(word, text)%>%
  left_join(get_sentiments("afinn"))%>%
  mutate(line1=floor(line/5))%>%
  group_by(line1, sort=TRUE)%>%
  summarize(Sentiment=sum(value, na.rm = T))

p2 <- plot_ly(tidy_frame5senti, x=~line1, y=~Sentiment)%>%add_bars()

# Fourth plot

phraseNet=function(text, connectors){
  textFrame=tibble(text=paste(text, collapse=" "))
  tidy_frame3=textFrame%>%unnest_tokens(word, text, token="ngrams", n=3)
  tidy_frame3
  tidy_frame_sep=tidy_frame3%>%separate(word, c("word1", "word2", "word3"), sep=" ")
  
  #SELECT SEPARATION WORDS HERE: now "is"/"are"
  tidy_frame_filtered=tidy_frame_sep%>%
    filter(word2 %in% connectors)%>%
    filter(!word1 %in% stop_words$word)%>%
    filter(!word3 %in% stop_words$word)
  tidy_frame_filtered
  
  edges=tidy_frame_filtered%>%count(word1,word3, sort = T)%>%
    rename(from=word1, to=word3, width=n)%>%
    mutate(arrows="to")
  
  right_words=edges%>%count(word=to, wt=width)
  left_words=edges%>%count(word=from, wt=width)
  
  #Computing node sizes and in/out degrees, colors.
  nodes=left_words%>%full_join(right_words, by="word")%>%
    replace_na(list(n.x=0, n.y=0))%>%
    mutate(n.total=n.x+n.y)%>%
    mutate(n.out=n.x-n.y)%>%
    mutate(id=word, color=brewer.pal(9, "Blues")[cut_interval(n.out,9)],  font.size=40)%>%
    rename(label=word, value=n.total)
  
  #FILTERING edges with no further connections - can be commented
  edges=edges%>%left_join(nodes, c("from"= "id"))%>%
    left_join(nodes, c("to"="id"))%>%
    filter(value.x>1|value.y>1)%>%select(from,to,width,arrows)
  
  nodes=nodes%>%filter(id %in% edges$from |id %in% edges$to )
  
  visNetwork(nodes,edges)
  
}


phrase5 <- phraseNet(text5, c("is", "are", "am", "was","were","at"))
phrase12 <- phraseNet(text12, c("is", "are", "am", "was","were","at"))

Code for Assignment 2 (Olive Data)

# loading the libraries

library(plotly)
library(crosstalk)
library(tidyr)
library(GGally)

#Read the Olive Data
olive <- read.csv("olive.csv")

o1 <- SharedData$new(olive)


#Question 2.1 - creating an interactive scatterplot
scatterOlive <- plot_ly(o1, x = ~linoleic, y = ~eicosenoic) %>%
  add_markers(color = I("black"))

scatterOlive


#Question 2.2 - creating the linked bar graph
barOlive <-plot_ly(o1, x=~as.factor(Region))%>%add_histogram()%>%layout(barmode="overlay")


p01 <- bscols(widths=c(2, NA),filter_slider("STE", "Stearic", o1, ~stearic)
       ,subplot(scatterOlive,barOlive)%>%
         highlight(on="plotly_selected", dynamic=T, persistent = T, opacityDim = I(1))%>%hide_legend())


#Question 2.3 - Create linked scatter plots

scatterOlive2 <- plot_ly(o1, x = ~linoleic, y = ~arachidic) %>%
  add_markers(color = I("black"))

p02 <- subplot(scatterOlive,scatterOlive2)%>%
  highlight(on="plotly_selected", dynamic=T, persistent=T, opacityDim = I(1))%>%hide_legend()



#Question 2.4 - Parallel coords


p<-ggparcoord(olive, columns = c(4:11))

d<-plotly_data(ggplotly(p))%>%group_by(.ID)
d1<-SharedData$new(d, ~.ID, group="olive")
p1<-plot_ly(d1, x=~variable, y=~value)%>%
  add_lines(line=list(width=0.3))%>%
  add_markers(marker=list(size=0.3),
              text=~.ID, hoverinfo="text")


olive2=olive
olive2$.ID=1:nrow(olive)
d2<-SharedData$new(olive2, ~.ID, group="olive")
p2<-plot_ly(d2, x=~factor(Region) )%>%add_histogram()%>%layout(barmode="stack")


ButtonsX=list()
for (i in 4:11){
  ButtonsX[[i-3]]= list(method = "restyle",
                        args = list( "x", list(olive[[i]])),
                        label = colnames(olive)[i])
}

ButtonsY=list()
for (i in 4:11){
  ButtonsY[[i-3]]= list(method = "restyle",
                        args = list( "y", list(olive[[i]])),
                        label = colnames(olive)[i])
}

ButtonsZ=list()
for (i in 4:11){
  ButtonsZ[[i-3]]= list(method = "restyle",
                        args = list( "z", list(olive[[i]])),
                        label = colnames(olive)[i])
}

p3 <- plot_ly(d2,x=~stearic,y=~oleic,z=~linoleic)%>%add_markers() %>%
  layout(xaxis=list(title=" "), yaxis=list(title=" "), zaxis=list(title=" "),
         title = "Select variable:",
         updatemenus = list(
           list(y=0.9, buttons = ButtonsX),
           list(y=0.6, buttons = ButtonsY),
           list(y=0.3, buttons = ButtonsZ)
         )  )

ps<-htmltools::tagList(p1%>%
                         highlight(on="plotly_selected", dynamic=T, persistent = T, opacityDim = I(1))%>%
                         hide_legend(),
                       p2%>%
                         highlight(on="plotly_selected", dynamic=T, persistent = T, opacityDim = I(1))%>%
                         hide_legend(), 
                       p3%>%
                         highlight(on="plotly_selected", dynamic=T, persistent = T, opacityDim = I(1))%>%
                         hide_legend()
)
htmltools::browsable(ps)